Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit

Background In-hospital mortality among lung cancer patients admitted to the intensive care unit (ICU) is extremely high. This study used machine learning models to predict in-hospital mortality in critically ill lung cancer patients and thereby provide relevant information for clinical decision-making. Methods Data were extracted from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database for the training cohort and from the eICU Collaborative Research Database (eICU-CRD) for the validation cohort. Logistic regression, random forest, decision tree, light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), and an ensemble (random forest+LightGBM+XGBoost) model were used to predict in-hospital mortality and to extract important features. The area under the receiver operating characteristic curve (AUC), accuracy, F1 score and recall were used to evaluate the predictive performance of each model. Shapley Additive exPlanations (SHAP) values were calculated to evaluate the importance of each feature. Results Overall, there were 653 (24.8%) in-hospital deaths in the training cohort and 523 (21.7%) in the validation cohort. Among the six machine learning models, the ensemble model achieved the best performance. The five most influential features in the random forest and XGBoost models were the sequential organ failure assessment (SOFA) score, albumin, the Oxford acute severity of illness score (OASIS), anion gap and bilirubin. A SHAP summary plot was used to illustrate the positive or negative effects of the top 15 features of the XGBoost model. Conclusion The ensemble model performed best and might be applied to forecast in-hospital mortality in critically ill lung cancer patients, and the SOFA score was the most important feature in all models. These results might offer a valuable reference for ICU clinicians’ advance decision-making.


Introduction
Lung cancer is the third most common malignancy; it is reported to be the leading cause of cancer death in males and the second most common cancer in females, accounting for more than one-fifth of all cancer deaths worldwide [1][2][3]. More than 158,000 patients died from lung cancer in the United States in 2016, which accounted for 27% of all cancer deaths [4,5]. Although lung cancer therapy has improved, the prognosis remains poor: the 5-year survival rate for all stages combined is only 15% [6,7]. Many lung cancer patients require admission to the intensive care unit (ICU), and respiratory failure requiring mechanical ventilation is the major reason for admission [8,9]. Although the prognosis of lung cancer patients admitted to the ICU has progressively improved, mortality remains extremely high: the ICU mortality rate was 43% and the in-hospital mortality rate was 60%, with an even higher rate in patients with stage IV disease (68%) [10]. Currently, the lack of early prediction and risk stratification for in-hospital mortality is the main challenge for ICU clinicians. Deciding which lung cancer patients admitted to the ICU are at high risk and likely to have a poor prognosis depends on a complex set of considerations, including therapeutic options and the wishes of patients and their families. These critically ill lung cancer patients usually have poor long-term survival and incur high financial costs. Hence, risk prediction models are needed to identify high-risk patients among critically ill lung cancer patients admitted to the ICU.
The development of artificial intelligence has led to a significant improvement in the predictive models used for estimating the risk of mortality in cancer patients. Machine learning (ML), a branch of artificial intelligence, can transform measurement results into relevant predictive models, especially cancer models, building on the rapid development of large datasets and deep learning. Recently, ML has been shown to be effective in predicting lung cancer susceptibility and the recurrence and survival of malignant tumors [11][12][13]. However, data on ML-based in-hospital mortality risk prediction models for patients with lung cancer in the ICU setting remain limited.
Therefore, this study aimed to develop six ML models, namely logistic regression, decision tree, random forest, light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), and an ensemble model, to predict in-hospital mortality among lung cancer patients admitted to the ICU, so that individualized prevention strategies could be proposed to help clinicians make therapeutic decisions. We also compared the six ML models to determine the best model for in-hospital mortality prediction in lung cancer patients admitted to the ICU.

Data source
This retrospective study used data from the eICU Collaborative Research Database (eICU-CRD) [14] and the Medical Information Mart for Intensive Care-IV (MIMIC-IV version 1.0) database [15]. The eICU-CRD contains data on more than 200,000 ICU admissions in 2014 and 2015 at 208 US hospitals, while MIMIC-IV includes information on more than 70,000 patients admitted to the ICUs of Beth Israel Deaconess Medical Center in Boston, MA, from 2008 to 2019. Because the data used in this study were extracted from public databases, the study was exempt from the requirement for informed consent from patients and approval of the Institutional Review Board (IRB). All procedures were performed according to the ethical standards of the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. After completing the web-based training courses (S1 Fig) and the Protecting Human Research Participants examination, we obtained permission to extract data from the eICU-CRD and MIMIC-IV databases.

Cohort selection
Patients meeting any of the following conditions were excluded: (1) younger than 18 years at first ICU admission; (2) repeated ICU admissions; (3) more than 80% of personal data missing. We randomly selected the MIMIC-IV database as the training cohort and the eICU-CRD database as the validation cohort. A total of 2,638 patients from the MIMIC-IV database were assigned to the training cohort and 2,414 patients from the eICU-CRD database were assigned to the validation cohort. The detailed flowchart is shown in Fig 1.
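The three exclusion criteria can be expressed as a simple filter. The following is a minimal sketch, not the extraction code used in the study: the `IcuStay` record, its field names, and the example rows are hypothetical, and we assume the missing-data share has already been computed per patient.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical record layout; the real databases use different schemas.
    patient_id: int
    age: int
    icu_admission_rank: int   # 1 = first ICU admission for this patient
    missing_fraction: float   # share of this patient's variables that are missing


def is_eligible(stay: IcuStay) -> bool:
    """Apply the three exclusion criteria described in the text."""
    if stay.age < 18:                   # (1) younger than 18 at first ICU admission
        return False
    if stay.icu_admission_rank > 1:     # (2) repeated ICU admission
        return False
    if stay.missing_fraction > 0.80:    # (3) more than 80% of data missing
        return False
    return True


# Made-up example rows, one per exclusion rule plus two eligible stays.
stays = [
    IcuStay(1, 67, 1, 0.10),   # eligible
    IcuStay(2, 17, 1, 0.05),   # excluded: under 18
    IcuStay(3, 72, 2, 0.12),   # excluded: repeated admission
    IcuStay(4, 58, 1, 0.85),   # excluded: too much missing data
    IcuStay(5, 80, 1, 0.80),   # eligible: exactly 80% missing is not "more than 80%"
]
cohort = [s for s in stays if is_eligible(s)]
print([s.patient_id for s in cohort])
```

The boundary case (exactly 80% missing) is kept here because the criterion reads "more than 80%"; a different reading would change the comparison operator.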

Data collection and outcomes
Baseline characteristics and admission information, including age, gender and body mass index (BMI), were obtained as described in previous studies. Comorbidities including hypertension, diabetes, chronic kidney disease, myocardial infarction, congestive heart failure, atrial fibrillation, valvular disease, chronic obstructive pulmonary disease, stroke, hyperlipidemia and liver disease were also collected for analysis based on the ICD codes recorded in the two databases. The Charlson comorbidity index (CCI) was also included. In addition, severity scores were collected, including the sequential organ failure assessment (SOFA) score, the Oxford acute severity of illness score (OASIS) and the acute physiology score III (APSIII). Acute complications during the ICU stay were also recorded: acute heart failure, acute respiratory failure, acute hepatic failure and cardiac arrest based on ICD codes, acute kidney injury within 48 hours based on the KDIGO guideline [16], and sepsis based on the sepsis 3.0 criteria [17]. Initial vital signs and laboratory results measured during the first 24 hours of ICU admission were also collected.
The primary outcome was in-hospital mortality.

Statistical analysis
Continuous covariates are reported as means and standard deviations. Categorical data are expressed as frequencies (percentages). The Chi-square test or Fisher's exact test was used, as appropriate, to compare differences between groups. Baseline characteristics are reported separately for the training cohort and the validation cohort. The comparison of baseline characteristics was performed in R software (version 4.1.0). P < 0.05 was considered statistically significant. Modeling work was done using Python 3.6.4.
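As an illustration of how a Chi-square comparison between two groups works, the sketch below computes the Pearson statistic for a 2x2 table using only the Python standard library. It is not the R code used for the analysis. The counts are the in-hospital deaths and cohort sizes reported in this paper, used here purely as example input; the paper does not report this particular comparison, and the helper name `chi_square_2x2` is our own.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: training cohort, validation cohort. Columns: died, survived.
train_died, train_total = 653, 2638
valid_died, valid_total = 523, 2414
chi2, p_value = chi_square_2x2(
    train_died, train_total - train_died,
    valid_died, valid_total - valid_died,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

For tables with small expected cell counts, Fisher's exact test would be the appropriate alternative, as noted in the text.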

Construction of in-hospital mortality predictive models
Logistic regression, decision tree, random forest, and two gradient boosting decision tree methods, LightGBM and XGBoost, were adopted to construct prediction models. To improve prediction, an ensemble model was constructed by applying a stacking strategy to random forest, LightGBM and XGBoost [18]: the predicted probabilities of the three models were input into a logistic regression model to produce a final prediction. Hence, six in-hospital mortality predictive models were developed (logistic regression, decision tree, random forest, LightGBM, XGBoost and the ensemble model), each of which used the full set of 100 features for each time window. Furthermore, the top 10 important features derived from the random forest, LightGBM, and XGBoost models were also analyzed [18].
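The second stage of the stacking strategy, in which a logistic regression combines the predicted probabilities of three base models, can be sketched as follows. This is a simplified, dependency-free illustration rather than the study's implementation: the base-model probabilities are hypothetical numbers standing in for random forest, LightGBM and XGBoost outputs, the function names are our own, and the plain batch gradient descent with its learning rate and epoch count is an assumption made for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on base-model probabilities by batch gradient descent."""
    n_models = len(base_probs[0])
    weights = [0.0] * n_models
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for probs, y in zip(base_probs, labels):
            # Gradient of the log-loss with respect to the linear score.
            error = sigmoid(bias + sum(w * p for w, p in zip(weights, probs))) - y
            for j, p in enumerate(probs):
                grad_w[j] += error * p
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, probs):
    """Final ensemble probability for one patient."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, probs)))


# Hypothetical predicted probabilities from three base models, one tuple per patient.
base_probs = [
    (0.90, 0.85, 0.80),
    (0.75, 0.80, 0.70),
    (0.60, 0.70, 0.65),
    (0.30, 0.20, 0.25),
    (0.15, 0.10, 0.20),
    (0.40, 0.35, 0.30),
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = died in hospital, 0 = survived

weights, bias = fit_meta_learner(base_probs, labels)
high = predict(weights, bias, (0.85, 0.80, 0.90))
low = predict(weights, bias, (0.10, 0.20, 0.15))
print(f"high-risk example: {high:.2f}, low-risk example: {low:.2f}")
```

In practice the base-model probabilities fed to the meta-learner are usually produced on data the base models were not trained on (for example by cross-validation), which limits leakage; whether the study did so is not stated in the text.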

Performance evaluation
The predictive performance of the decision tree, random forest, LightGBM, XGBoost, ensemble and logistic regression models was evaluated and compared. Each model was evaluated according to accuracy, recall, F1 score, and the area under the receiver operating characteristic curve (AUC) [19].
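For reference, the four metrics can be computed from labels and predicted scores as in the sketch below. It uses only the standard library and made-up example data; the 0.5 classification threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case).

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = died) and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), recall(y_true, y_pred),
      f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy, recall and F1 depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported together.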

SHAP analysis
To further analyze the positive or negative effects of the important features identified for in-hospital mortality prediction, and to investigate the relationship between these features and the model output, a Shapley additive explanations (SHAP) analysis was performed using Python 3.7.0. The SHAP value is the contribution to the predicted value that is assigned to each feature of the data [20].
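The idea behind a SHAP value can be illustrated with an exact Shapley computation on a toy model. The sketch below is a simplification under stated assumptions: `toy_risk_score`, its coefficients, and the baseline and patient values are all hypothetical, and absent features are replaced by a single reference value, whereas the SHAP library averages over a background dataset and uses faster tree-specific algorithms.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    contrib = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference patient
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the patient's value
            output = model(current)
            contrib[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in contrib.items()}


def toy_risk_score(x):
    # Hypothetical score, NOT the fitted model from the study.
    score = 0.30 * x["sofa"] - 0.50 * x["albumin"] + 0.05 * x["anion_gap"]
    if x["sofa"] > 8 and x["albumin"] < 3.0:
        score += 1.0  # interaction term shared between SOFA and albumin
    return score


baseline = {"sofa": 4, "albumin": 3.5, "anion_gap": 14}   # reference values (assumed)
patient = {"sofa": 10, "albumin": 2.5, "anion_gap": 20}   # patient to explain (assumed)

phi = shapley_values(toy_risk_score, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets each prediction be decomposed feature by feature.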

Baseline characteristics
A total of 5,052 patients were finally included in the present study: 2,638 patients in the training cohort extracted from the MIMIC-IV database and 2,414 patients in the validation cohort extracted from the eICU-CRD database. There were 653 (24.8%) in-hospital deaths in the training cohort and 523 (21.7%) in the validation cohort. Table 1 shows the baseline characteristics of the training and validation cohorts.

Model performance
Six models (logistic regression, decision tree, random forest, LightGBM, XGBoost, and the ensemble model) were used to predict in-hospital mortality using all the features. As can be seen in Table 2, the traditional logistic regression model exhibited the worst predictive ability, followed in ascending order by decision tree, random forest, XGBoost and LightGBM. The ensemble model showed the best predictive ability, with the highest accuracy (0.89), recall (0.80), F1 score (0.82) and AUC (0.92) in the training cohort. The results in the validation cohort were similar to those in the training cohort (Table 2). We also performed ROC analysis to further confirm the predictive ability of the six models. As shown in Fig 2A and 2B, the logistic regression model again had the worst predictive ability, followed by decision tree, random forest, XGBoost and LightGBM, and the ensemble model showed the best predictive performance in both the training and validation cohorts.

Feature importance analysis
To clarify which features most affect model output, a feature importance analysis was conducted. The top 15 features derived from the random forest, LightGBM, and XGBoost models are shown in Fig 3. In the random forest model, the SOFA score was the most influential feature, followed by albumin, OASIS score, anion gap, bilirubin, mechanical ventilation, acute respiratory failure, APSIII score, length of hospital stay, BUN, WBC, respiratory rate, vasopressor use and RDW (Fig 3A). For the LightGBM model, anion gap played the most important role in predicting in-hospital mortality; SOFA score, OASIS score, albumin, length of hospital stay, bilirubin, WBC, platelets, BUN, heart rate, MCH, APSIII score, creatinine and MCV also played important roles (Fig 3B). In the XGBoost model, the SOFA score had the most influence on in-hospital mortality prediction, followed by anion gap, bilirubin, OASIS score, albumin, white blood cell count, bicarbonate, length of hospital stay, acute respiratory failure, RDW, temperature, creatinine, platelets, MCHC and BMI (Fig 3C). The feature importance analysis for the random forest, LightGBM, and XGBoost models was also conducted in the validation cohort (S2-S4 Figs), and the results were consistent with those of the training cohort.

SHAP analysis
To show the overall positive or negative impact of each feature on model output, and to analyze the similarities and differences in important characteristics among critically ill lung cancer patients of different severities, the SHAP summary plot was used. As shown in Fig 4, the SOFA score ranked first in importance among the top 20 features of the XGBoost model, and the higher the SOFA score, the higher the probability of in-hospital mortality, indicating that the SOFA score should be considered first in in-hospital mortality prediction. Taking the XGBoost model, which performed well in predicting death or survival using all features, as an example, a representative patient who died and a representative patient who survived were selected to illustrate the effect of features on the prediction using the SHAP analysis method. As shown in Fig 5, for the patient who died, the SOFA score played a major positive role in the prediction; the SHAP value of the final model prediction for this patient was 0.96, which is greater than 0, so the patient was correctly predicted to die in hospital. For the patient who survived, anion gap played a major positive role and the SOFA score played a major negative role in the prediction; the SHAP value of the final model prediction for this patient was -1.23, which is less than 0, so the patient was correctly predicted to survive.
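The comparison of the model output with 0 can be made concrete if we assume, as is the default for tree-based binary classifiers explained with SHAP, that the force-plot output is expressed in log-odds units. This assumption is ours and is not stated in the text. Under it, an output of 0 corresponds to a predicted probability of 0.5, and the two example outputs convert to probabilities as sketched below.

```python
import math


def log_odds_to_probability(margin):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-margin))


# Model outputs quoted in the text for the two representative patients.
examples = {"patient who died": 0.96, "patient who survived": -1.23}

probabilities = {}
for label, margin in examples.items():
    prob = log_odds_to_probability(margin)
    probabilities[label] = prob
    predicted = "death" if margin > 0 else "survival"  # threshold at log-odds 0
    print(f"{label}: output {margin:+.2f} -> probability {prob:.3f} -> predicted {predicted}")
```

If the model output were instead on the probability scale, the threshold would be 0.5 rather than 0, so the interpretation depends on which scale the explainer was configured to use.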

Discussion
In this retrospective study, we developed and validated machine learning algorithms based on clinical features from the large public databases MIMIC-IV and eICU-CRD to predict in-hospital mortality in critically ill lung cancer patients. The LightGBM model exhibited the best performance among the single models, whereas the ensemble model, which applied a stacking strategy using random forest, LightGBM and XGBoost, exhibited the greatest AUC among the models we tested. Using advanced machine learning techniques, we identified some important clinical features associated with in-hospital mortality, such as SOFA score, anion gap, albumin, OASIS score and acute respiratory failure. These results have some implications and require further consideration. ICU-related in-hospital mortality for lung cancer ranks highest among the solid tumors, and in-hospital mortality in lung cancer patients admitted to the ICU varies according to lung cancer stage. Previous studies reported that ICU mortality in patients with extensive or advanced lung cancer exceeds 50%. Park et al. investigated patients in Korea who had been newly diagnosed with lung cancer between 2008 and 2010 and reported an in-hospital mortality of 58.3% in critically ill patients with advanced lung cancer [21]. In addition, Song et al. analyzed patients with advanced lung cancer, including stage IIIB or IV non-small cell lung cancer and extensive-stage small cell lung cancer, admitted to the ICU and found that in-hospital mortality was 82.4% before 2011 and 65.9% after 2011 [22]. Our result was similar to that of Adam et al. [23], who reported a 20% in-hospital mortality rate in stage I non-small cell lung cancer. This may be because the vast majority of the lung cancers in our cohort were primary rather than metastatic, so in-hospital mortality in the present study was lower than that reported for critically ill patients with advanced lung cancer.
Unfortunately, it is difficult for clinicians to identify patients at high risk of in-hospital death in the ICU. Therefore, developing and promoting reliable prediction models is particularly urgent for identifying these patients and providing them with timely and effective interventions to improve their prognosis.
Given the increasing applicability and effectiveness of supervised machine learning algorithms in predictive disease modeling, the breadth of research in this area continues to grow [24,25]. Well-known supervised learning classifiers, including support vector machine, random forest, convolutional neural network, and decision tree, have gradually been applied in clinical practice [26,27]. Machine learning-assisted decision-support models have been shown to have advantages over the traditional linear regression model. In this study, we used six different machine learning methods (logistic regression, decision tree, random forest, LightGBM, XGBoost, and ensemble models) to build predictive models. Four popular metrics (AUC, F1 score, accuracy and recall) were used to evaluate the performance of these algorithms. The results showed that the ensemble model (which combined random forest, LightGBM and XGBoost) achieved the best performance and predictive stability, which is consistent with a previous report [18]. Among the single models, the LightGBM model achieved the best predictive performance. LightGBM is a novel technique that has been widely adopted in tumor survival prediction but not yet in critical care research [28,29]. Otaguro et al. evaluated data from patients who underwent intubation for respiratory failure and received mechanical ventilation in the ICU and used three learning algorithms (random forest, XGBoost, and LightGBM) to predict successful extubation; LightGBM exhibited the best overall performance [30]. Moreover, Yang et al. adopted nine machine learning models to predict in-hospital mortality in critically ill patients with hypertension and found that the LightGBM model had the best predictive ability [31].
We used the visualization functions in SHAP to examine the effect of the specific value of each variable on model output. The factors contributing most included SOFA score, anion gap and albumin. The SOFA score is a useful tool for quantifying the degree of organ dysfunction or failure present on ICU admission and has been widely used for in-hospital mortality prediction in ICU settings [32][33][34][35]. The SOFA score has been reported to perform better than other scoring systems in predicting infection-related in-hospital mortality in ICU patients: the higher the SOFA score, the higher the risk of in-hospital mortality [36]. Anion gap (AG) is commonly used to classify acid-base disorders and to diagnose various conditions. Recently, AG has been reported to be associated with in-hospital mortality in ICU patients. Hu et al. indicated that AG was related to in-hospital mortality in intensive care patients with sepsis [37]. Moreover, Chen et al. demonstrated that AG could significantly predict ICU mortality in aortic aneurysm patients [38]. Hypoalbuminemia is almost always associated with a worse prognosis, and a low albumin level is usually related to a higher risk of in-hospital mortality in ICU settings [39]. Moreover, SHAP force plots for one patient who died and one who survived (Fig 5) were used to further verify the effect of features on the prediction, and the results confirmed that features such as SOFA score, anion gap and albumin have positive or negative effects on the output of these predictive models.
We acknowledge some limitations of this research. First, the retrospective and observational nature of our study may lead to inevitable selection bias. Second, the data used in this study came from the public databases MIMIC-IV and eICU-CRD, so further external validation is required to guard against overfitting. Third, the data did not include any information on the pathologic and radiologic findings of lung cancer. We could not differentiate between small cell carcinoma and non-small cell carcinoma, and the models may be skewed because important medical information about molecular diagnosis was unavailable.

Conclusions
In the present study, we applied six machine learning methods to predict in-hospital mortality in critically ill lung cancer patients. The ensemble model achieved the best predictive performance, and the LightGBM model performed best among the single models. The SOFA score, anion gap and albumin were the most important factors affecting the output of the machine learning models in predicting in-hospital mortality in critically ill patients with lung cancer. Our study provides clinical feature interpretations that ICU clinicians can use as a reference in prognosis prediction.

Ethics approval and consent to participate
The study was ethically approved by an affiliate of the Massachusetts Institute of Technology (No.27653720). All patient-related information in the databases is anonymized, so informed consent from patients was not required. This study is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement and was conducted in conformity with the tenets of the Declaration of Helsinki.